Training Priors Predict Text-To-Image Model Performance
Text-to-image models can often generate some relations, e.g., "astronaut
riding horse", but fail to generate other relations composed of the same basic
parts, e.g., "horse riding astronaut". These failures are often taken as
evidence that the models rely on training priors rather than constructing novel
images compositionally. This paper tests this intuition directly on the
Stable Diffusion 2.1 text-to-image model. By looking at the subject-verb-object
(SVO) triads that form the backbone of these prompts (e.g., "astronaut",
"ride", "horse"), we find that the more often an SVO triad appears in the
training data, the better the model can generate an image aligned with that
triad. Here, by aligned we mean that each term appears in the generated image
and that the terms stand in the proper relation to one another. However, this increased frequency
also diminishes how well the model can generate an image aligned with the
flipped triad. For example, if "astronaut riding horse" appears frequently in
the training data, the image for "horse riding astronaut" will tend to be
poorly aligned. We also find that models often struggle to generate terms in
atypical roles, e.g., if "horse" is more often the semantic patient (object),
the model might struggle to visualize it as a semantic agent (subject). Our
results thus show that current models are biased to generate images aligned
with relations seen in training and provide important new data in the ongoing
debate on whether these text-to-image models employ abstract compositional
structure in a traditional sense, or rather, interpolate between relations
explicitly seen in the training data.
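To make the frequency statistic concrete, here is a minimal sketch of counting SVO triads in a caption corpus with spaCy dependency parsing. The captions, model, and dependency labels below are illustrative assumptions, not the paper's exact extraction pipeline.

```python
# Minimal sketch: counting subject-verb-object (SVO) triads in a caption corpus.
# Assumes spaCy and the small English model are installed; labels and captions
# are illustrative, not the paper's exact pipeline.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_svo_triads(caption):
    """Return (subject, verb, object) lemma triads found in one caption."""
    triads = []
    for token in nlp(caption):
        if token.pos_ == "VERB":
            subjects = [c.lemma_ for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c.lemma_ for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    triads.append((s, token.lemma_, o))
    return triads

captions = [
    "An astronaut rides a horse on the moon.",
    "A horse rides an astronaut.",  # the rare, flipped relation
]

triad_counts = Counter(t for c in captions for t in extract_svo_triads(c))
print(triad_counts)  # e.g. ('astronaut', 'ride', 'horse') vs. ('horse', 'ride', 'astronaut')
```

Aggregating such counts over the training captions gives the per-triad frequencies that the paper correlates with image-text alignment.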
Circuit Component Reuse Across Tasks in Transformer Language Models
Recent work in mechanistic interpretability has shown that behaviors in
language models can be successfully reverse-engineered through circuit
analysis. A common criticism, however, is that each circuit is task-specific,
and thus such analysis cannot contribute to understanding the models at a
higher level. In this work, we present evidence that insights (both low-level
findings about specific heads and higher-level findings about general
algorithms) can indeed generalize across tasks. Specifically, we study the
circuit discovered in Wang et al. (2022) for the Indirect Object Identification
(IOI) task and (1) show that it reproduces on a larger GPT-2 model, and (2) that
it is mostly reused to solve a seemingly different task: Colored Objects
(Ippolito & Callison-Burch, 2023). We provide evidence that the process
underlying both tasks is functionally very similar, and that the two circuits
share roughly 78% of their in-circuit attention heads. We further present a proof-of-concept
intervention experiment, in which we adjust four attention heads in middle
layers in order to 'repair' the Colored Objects circuit and make it behave like
the IOI circuit. In doing so, we boost accuracy from 49.6% to 93.7% on the
Colored Objects task and explain most sources of error. The intervention
affects downstream attention heads in specific ways predicted by their
interactions in the IOI circuit, indicating that this subcircuit behavior is
invariant to the different task inputs. Overall, our results provide evidence
that it may yet be possible to explain large language models' behavior in terms
of a relatively small number of interpretable task-general algorithmic building
blocks and computational components.
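As a hedged sketch of what such a head-level intervention can look like in practice, the snippet below adjusts specific attention heads at inference time with TransformerLens hooks on GPT-2; the (layer, head) pairs and the scaling factor are placeholders, not the four heads the paper identifies.

```python
# Minimal sketch (not the paper's exact procedure): scaling the outputs of
# chosen attention heads at inference time with TransformerLens.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
heads_to_adjust = [(7, 3), (8, 6)]   # hypothetical (layer, head) pairs
scale = 2.0                          # hypothetical scaling factor

def scale_head_output(z, hook, head_idx):
    # z: [batch, position, head, d_head]; scale one head's output
    z[:, :, head_idx, :] = scale * z[:, :, head_idx, :]
    return z

fwd_hooks = [
    (utils.get_act_name("z", layer),
     lambda z, hook, h=head: scale_head_output(z, hook, h))
    for layer, head in heads_to_adjust
]

tokens = model.to_tokens("On the table, I see a red pencil and a blue mug. The mug is")
with torch.no_grad():
    logits = model.run_with_hooks(tokens, fwd_hooks=fwd_hooks)
print(model.to_string(logits[0, -1].argmax()))  # next-token prediction under the intervention
```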
A Mechanism for Solving Relational Tasks in Transformer Language Models
A primary criticism towards language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs
sometimes exploit a simple computational mechanism to solve one-to-one
relational tasks (e.g., capital_of(Poland)=Warsaw). We investigate a range of
language model sizes (from 124M parameters to 176B parameters) in an in-context
learning setting, and find that for a variety of tasks (involving capital
cities, upper-casing, and past-tensing) a key part of the mechanism reduces to
a simple linear update typically applied by the feedforward networks (FFNs).
These updates also tend to promote the output of the relation in a
content-independent way (e.g., encoding Poland:Warsaw::China:Beijing),
revealing a predictable pattern in how these models solve these tasks.
We further show that this mechanism is specific to tasks that require retrieval
from pretraining memory, rather than retrieval from local context. Our results
contribute to a growing body of work on the mechanistic interpretability of
LLMs, and offer reason to be optimistic that, despite the massive and
non-linear nature of the models, the strategies they ultimately use to solve
tasks can sometimes reduce to familiar and even intuitive algorithms.
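To illustrate the general idea of reading off what an FFN update promotes, here is a minimal logit-lens-style sketch with TransformerLens: project the FFN output at the final position onto the vocabulary. The prompt, layer, and model size are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch: project an FFN (MLP) update onto the vocabulary to see which
# tokens it promotes. Layer, prompt, and model choice are illustrative.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "Q: What is the capital of France? A: Paris. Q: What is the capital of Poland? A:"
tokens = model.to_tokens(prompt)

with torch.no_grad():
    _, cache = model.run_with_cache(tokens)

layer = 9                                    # hypothetical mid/late layer
mlp_update = cache["mlp_out", layer][0, -1]  # FFN output added to the residual stream
vocab_scores = mlp_update @ model.W_U        # project the update onto the vocabulary

top = torch.topk(vocab_scores, 5).indices
print([model.to_string(t) for t in top])     # tokens the FFN update promotes
```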
Characterizing Mechanisms for Factual Recall in Language Models
Language Models (LMs) often must integrate facts they memorized in
pretraining with new information that appears in a given context. These two
sources can disagree, causing competition within the model, and it is unclear
how an LM will resolve the conflict. On a dataset that queries for knowledge of
world capitals, we investigate both distributional and mechanistic determinants
of LM behavior in such situations. Specifically, we measure the proportion of
the time an LM will use a counterfactual prefix (e.g., "The capital of Poland
is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia
and GPT-2, the training frequency of both the query country ("Poland") and the
in-context city ("London") strongly affects the models' likelihood of using the
counterfactual. We then use head attribution to identify individual attention
heads that either promote the memorized answer or the in-context answer in the
logits. By scaling up or down the value vector of these heads, we can control
the likelihood of using the in-context answer on new data. This method can
increase the rate of generating the in-context answer to 88% of the time
simply by scaling a single head at runtime. Our work contributes to a body of
evidence showing that we can often localize model behaviors to specific
components and provides a proof of concept for how future methods might control
model behavior dynamically at runtime.
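As a hedged sketch of the value-vector scaling idea, the snippet below amplifies one attention head's value vectors with a TransformerLens hook; the layer, head, and scale are placeholders, not the head identified by the attribution method in the paper.

```python
# Minimal sketch: scale one attention head's value vectors to push the model
# toward the in-context answer over the memorized one. Layer/head/scale are
# hypothetical choices for illustration.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
layer, head, scale = 10, 2, 3.0   # placeholders, not the paper's identified head

def scale_value_vectors(v, hook):
    # v: [batch, position, head, d_head]; amplify one head's value vectors
    v[:, :, head, :] = scale * v[:, :, head, :]
    return v

prompt = "The capital of Poland is London. The capital of Poland is"
tokens = model.to_tokens(prompt)
with torch.no_grad():
    logits = model.run_with_hooks(
        tokens,
        fwd_hooks=[(utils.get_act_name("v", layer), scale_value_vectors)],
    )
print(model.to_string(logits[0, -1].argmax()))  # in-context answer becomes more likely when scaled up
```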
Analyzing Modular Approaches for Visual Question Decomposition
Modular neural networks without additional training have recently been shown
to surpass end-to-end neural networks on challenging vision-language tasks. The
latest such methods simultaneously introduce LLM-based code generation to build
programs and a number of skill-specific, task-oriented modules to execute them.
In this paper, we focus on ViperGPT and ask where its additional performance
comes from and how much is due to the (state-of-the-art, end-to-end) BLIP-2 model
it subsumes vs. additional symbolic components. To do so, we conduct a
controlled study (comparing end-to-end, modular, and prompting-based methods
across several VQA benchmarks). We find that ViperGPT's reported gains over
BLIP-2 can be attributed to its selection of task-specific modules, and when we
run ViperGPT using a more task-agnostic selection of modules, these gains go
away. Additionally, ViperGPT retains much of its performance if we make
prominent alterations to its selection of modules, e.g., removing BLIP-2 entirely
or retaining only BLIP-2. Finally, we compare ViperGPT against a prompting-based
decomposition strategy and find that, on some benchmarks, modular approaches
significantly benefit by representing subtasks with natural language, instead
of code.
Comment: Published at EMNLP 2023 (Main Conference). Source code:
https://github.com/brown-palm/visual-question-decompositio
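To illustrate the contrast between the two decomposition styles, here is a minimal sketch; the `vqa_model` and `llm_answer` callables are hypothetical stand-ins, not ViperGPT's or the paper's actual components.

```python
# Minimal sketch of the contrast: sub-questions in natural language vs. an
# executable program over task modules. The callables are hypothetical.

question = "Is the mug to the left of the red pencil?"

# Natural-language decomposition (prompting-based strategy)
nl_subtasks = [
    "Where is the mug located in the image?",
    "Where is the red pencil located in the image?",
    "Given those locations, is the mug to the left of the red pencil?",
]

# Code-style decomposition (ViperGPT-style program; illustrative pseudo-API)
code_subtask = '''
mug = image.find("mug")[0]
pencil = image.find("red pencil")[0]
answer = "yes" if mug.horizontal_center < pencil.horizontal_center else "no"
'''

def answer_with_nl_decomposition(image, subtasks, vqa_model, llm_answer):
    """Hypothetical driver: answer each sub-question with a VQA model,
    then let an LLM aggregate the intermediate answers."""
    intermediate = [vqa_model(image, q) for q in subtasks[:-1]]
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(subtasks, intermediate))
    return llm_answer(context + "\nQ: " + subtasks[-1] + "\nA:")
```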
NeuroSurgeon: A Toolkit for Subnetwork Analysis
Despite recent advances in the field of explainability, much remains unknown
about the algorithms that neural networks learn to represent. Recent work has
attempted to understand trained models by decomposing them into functional
circuits (Csordás et al., 2020; Lepori et al., 2023). To advance this
research, we developed NeuroSurgeon, a Python library that can be used to
discover and manipulate subnetworks within models in the Huggingface
Transformers library (Wolf et al., 2019). NeuroSurgeon is freely available at
https://github.com/mlepori1/NeuroSurgeon
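As a generic illustration of what a weight-level subnetwork is (this is not NeuroSurgeon's actual API), the sketch below masks a linear layer's weights so that only a subset of them contributes to the forward pass.

```python
# Generic illustration of the underlying idea (weight masking to isolate a
# subnetwork); NOT the NeuroSurgeon API, just a minimal PyTorch sketch.
import torch
import torch.nn as nn

layer = nn.Linear(16, 16)
mask = (torch.rand_like(layer.weight) > 0.5).float()  # learned/optimized in practice

def masked_forward(x):
    # Only the unmasked "subnetwork" weights contribute to the output
    return nn.functional.linear(x, layer.weight * mask, layer.bias)

x = torch.randn(1, 16)
print(masked_forward(x).shape)  # torch.Size([1, 16])
```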